Path Consistency Learning in Tsallis Entropy Regularized MDPs

نویسندگان

Ofir Nachum

Yinlam Chow

Mohammad Ghavamzadeh

چکیده

We study the sparse entropy-regularized reinforcement learning (ERL) problem in which the entropy term is a special form of the Tsallis entropy. The optimal policy of this formulation is sparse, i.e., at each state, it has non-zero probability for only a small number of actions. This addresses the main drawback of the standard Shannon entropy-regularized RL (soft ERL) formulation, in which the optimal policy is softmax, and thus, may assign a non-negligible probability mass to non-optimal actions. This problem is aggravated as the number of actions is increased. In this paper, we follow the work of Nachum et al. (2017) in the soft ERL setting, and propose a class of novel path consistency learning (PCL) algorithms, called sparse PCL, for the sparse ERL problem that can work with both on-policy and off-policy data. We first derive a sparse consistency equation that specifies a relationship between the optimal value function and policy of the sparse ERL along any system trajectory. Crucially, a weak form of the converse is also true, and we quantify the sub-optimality of a policy which satisfies sparse consistency, and show that as we increase the number of actions, this suboptimality is better than that of the soft ERL optimal policy. We then use this result to derive the sparse PCL algorithms. We empirically compare sparse PCL with its soft counterpart, and show its advantage, especially in problems with a large number of actions.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A unified view of entropy-regularized Markov decision processes

We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result is showing that using the conditional entropy of the joint state-action distributions as regularization...

متن کامل

Tsallis Entropy and Conditional Tsallis Entropy of Fuzzy Partitions

The purpose of this study is to define the concepts of Tsallis entropy and conditional Tsallis entropy of fuzzy partitions and to obtain some results concerning this kind entropy. We show that the Tsallis entropy of fuzzy partitions has the subadditivity and concavity properties. We study this information measure under the refinement and zero mode subset relations. We check the chain rules for ...

متن کامل

A non-extensive maximum entropy based regularization method for bad conditioned inverse problems

A regularization method based on the non-extensive maximum entropy principle is devised. Special emphasis is given to the q = 1=2 case. We show that, when the residual principle is considered as constraint, the q = 1=2 generalized distribution of Tsallis yields a regularized solution for bad-conditioned problems. The so devised regularized distribution is endowed with a component which correspo...

متن کامل

A Preferred Definition of Conditional Rényi Entropy

The Rényi entropy is a generalization of Shannon entropy to a one-parameter family of entropies. Tsallis entropy too is a generalization of Shannon entropy. The measure for Tsallis entropy is non-logarithmic. After the introduction of Shannon entropy , the conditional Shannon entropy was derived and its properties became known. Also, for Tsallis entropy, the conditional entropy was introduced a...

متن کامل

A note on inequalities for Tsallis relative operator entropy

‎In this short note‎, ‎we present some inequalities for relative operator entropy which are generalizations of some results obtained by Zou [Operator inequalities associated with Tsallis relative operator entropy‎, ‎{em Math‎. ‎Inequal‎. ‎Appl.}‎ ‎{18} (2015)‎, ‎no‎. ‎2‎, ‎401--406]‎. ‎Meanwhile‎, ‎we also show some new lower and upper bounds for relative operator entropy and Tsallis relative o...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

CoRR

دوره abs/1802.03501 شماره

صفحات -

تاریخ انتشار 2018

Path Consistency Learning in Tsallis Entropy Regularized MDPs

نویسندگان

چکیده

منابع مشابه

A unified view of entropy-regularized Markov decision processes

Tsallis Entropy and Conditional Tsallis Entropy of Fuzzy Partitions

A non-extensive maximum entropy based regularization method for bad conditioned inverse problems

A Preferred Definition of Conditional Rényi Entropy

A note on inequalities for Tsallis relative operator entropy

عنوان ژورنال:

اشتراک گذاری